We present a study of the properties of a network of political discussions on one of the most popular
Polish Internet forums. This provides the opportunity to study computer-mediated human
interactions in a strongly bipolar environment. The comments of the participants are found to be
mostly disagreements, with a high percentage of invective and provocative ones. Binary exchanges
(quarrels) play a significant role in the network growth and topology. Statistical analysis shows that
the growth of the discussions depends on the degree of controversy of the subject and the intensity
of personal conflict between the participants. This is in contrast to most previously studied social
networks, for example networks of scientific citations, where the nature of the links is much more
positive and based on similarity and collaboration rather than opposition and abuse. The work
also discusses the implications of the findings for more general studies of consensus formation, where
our observations of increased conflict contradict the usual assumptions that interactions between
people lead to averaging of opinions and agreement.
I. INTRODUCTION

Most studies of social networks concentrate on
properties of groups formed due to attraction among
participants. In such situations the links form between
actors sharing some similarity, for example a common
interest (in the case of scientific collaboration networks or
cross-linked Internet sites) or likeness of views (in the
case of political associations). The networks of social
interactions grow via preferential attachment on a 'rich get
richer' principle. In most situations, differences and
conflicting positions and opinions are viewed as limitations
and barriers to such network formation. A frequent
psychological reaction to meeting someone who holds
an opposing view is not an attempt to convince (a
phenomenon widely assumed in the "consensus formation"
models), but rather cutting off the connection. In
face-to-face encounters this avoidance limits the growth of
networks based on contrariness. Perhaps the most developed
form of such direct 'hate based networks' might be long
term family or tribal feuds. The advance of modern
technologies has, however, provided an indirect contact field,
where it is possible to express contrary views, hate-filled
reactions and aggressive attacks without the risk of
physical injury or personal danger. This 'bravery of being out
of range' allows such networks to form and flourish. In
this paper we present a study of specific networks that
benefit and grow thanks to disagreements and hate.

A good example of such networks is offered by user
comments on news items published on the Internet, a
feature common to most web portals today. The ease
of access and anonymity that the Internet provides
have greatly increased the number of people participating in
such discussions. From the research point of view, such
discussions are relatively easy to document and can
provide the data necessary for meaningful statistical work.

For our study we have chosen discussion forums at one
of the most popular Internet portals and news sites in
Poland, http://www.gazeta.pl. Specifically, we have
limited our research to discussions spurred by the
Politics subsection of the news. The current political situation
in Poland makes it an almost ideal ground for such a
study. There is an almost clearly bipolar split between the
two main political factions (Platforma Obywatelska, PO,
and Prawo i Sprawiedliwosc, PiS). The conflict shown at
the highest positions of the state is even more visible in
the group of active readers of Internet portals. In fact
the participants are probably better characterized as
belonging to the anti-PO and anti-PiS groups rather than
the 'pro' counterparts.

The reason for choosing this particular forum is the 
fact that while preserving anonymity of the users, it also 
provides relative recognizability of participants. This re- 
sults from the fact that only registered users are allowed 
to post comments, and can be identified by registered 
nicknames. We may assume that participant XYZ in 
one discussion thread is the same person as participant 
XYZ in a different one. Of course this leaves open the 
possibility of a single real person using multiple Internet 
personalities. However, even with this limited trackabil- 
ity, it is possible to try to find hubs of communication, 
both in the comment writing and in reaction to published 
comments. 

Our goal is to find out whether the change in motivations for
linking from positive to negative changes the general network
properties.
II. METHODS

The data for the study have been gathered using a
dedicated program, written for the purpose of loading and
initial analysis of the discussion threads at the selected
site. The program performed automated tasks of data
collection and cleaning and enabled the next step, which
consisted of assigning a political stance to discussion
participants and classifying the comments. This part
of the analysis, by far the most time-consuming and
cumbersome, had to be done by a human, by reading all the
comments in a thread.

It should be noted that in almost all cases (with a
single exception only), the whole discussions were actually
linked not to the original news article but to the first
comment. This is a result of the operational process of the
portal, and the fact that pushing the 'comment' button
in typical situations links the post not to the original
source but to the earliest existing post. To avoid
spurious statistics this phenomenon has been corrected for by
the program.

The participants were assigned one of three possible types
(called nodeclass). Two were for commentators whose
viewpoints were visibly in agreement with one of the two
factions (nodeclasses A and B). The remaining
participants, for whom it was impossible to clearly assign a
position, were given nodeclass NA.

The comments were classified according to the follow- 
ing scheme: 

Agr - comment agrees with the covered material (either 
the original news coverage or the preceding com- 
ment in a thread); 

Dis - comment disagrees with the covered material (ei- 
ther the original news coverage or the preceding 
comment in a thread); 

Inv - comment is a direct invective and personal abuse 
of the previous commentator; 

Prv - provocation - comment is aimed at causing dissent, 
often only weakly related to the topic of discussion; 

Neu - comment is neutral in nature, neither in obvious
agreement nor disagreement;

Jst - 'just stupid' comment, which is totally unrelated 
to the topic of discussion; 

Swi - comment signifying a switch in participant's posi- 
tion leading to agreement between two previously 
opposing commentators. 

Other works on computer-mediated discussions in closed
communities used different message classification schemes,
for example Jeong [1, 2] has proposed grouping comments
by categories such as 'Arg' - argument for a given
thesis (corresponding to our Agr category); 'But' - a
challenge (corresponding to our Dis category); 'Expl' - for
posts giving explanations; and 'Evid' for posts giving
factual evidence. In our case, the explanation and evidence
posts were rather scarce, possibly due to the political nature
of the disputes. We have therefore opted for a
categorization that reflected the emotional nature of
communication, rather than the factual one.

Following the process of categorization we performed
standard analyses typical for network systems. As the
literature on the subject is very rich we refer here to the
general overviews, for example Dorogovtsev and Mendes
[3], Albert and Barabasi [4], Newman [5], Newman et al.
[6]. It should be noted that the average size of the
network formed by posts related to a single news item was
relatively small (from a few tens to a few hundreds of
comments), thus the statistical spread of results for single
discussion threads was rather significant.

In our analysis we have used the publicly available
program GUESS, developed and maintained by Eytan Adar
(http://graphexploration.cond.org/index.html,
see also Adar [7], Adar and Miryung [8]).

To understand the data we have developed a
computer simulation model, which has resulted in
quantitatively comparable system characteristics, allowing us to
understand the role of the most important factors driving
the growth of comment networks. Details of the model
are given in Section III E. The programs and scripts used
in the analysis are available from the authors on request.



III. RESULTS 

A. General statistics of discussions and temporal 
dynamics 

The statistical properties of discussion threads depend,
obviously, on the visibility of the news stories they relate
to. Some of the news items are featured on the portal opening
page, so one would expect that this should influence the
number of user comments. Our observations do not
confirm these expectations - the advantage of the 'front page
news' is not significant. User activity does not
follow the editors' choices. Within the Politics category
the portal carries between 5 and 20 news items
each day. On a typical graphics display, a visitor sees
the 4-6 most recent news items (although the web page
also has a 'most commented' section, allowing a shortcut to
older, but popular stories). While the 'screen space' and
graphical clues give no preferences (order of presentation
is strictly temporal), the number of comments spurred by
each story varies significantly, depending on its content.

Discussion size distribution shows a definite fat-tail be- 
haviour. In addition to news items that raise no commen- 
tary at all, and weakly commented ones (below 50-70 
comments), there are quite a few mid-sized discussions 
(up to about 200 comments) and occasional extended 
discussions (between 200 and 500 posts). 

The news-related discussions are, by their very nature,
short-lived. While the portal allows users to view and comment
on news dating back more than 2 weeks, the comment
frequency vanishes rapidly with time. Usually, there are
very few comments later than 24 hours after publication,
and practically none after 48 hours. In fact numerical
analysis of the threads shows for many discussions a
reasonably good fit with an exponential decay timeline, with
a half-life of between 1 and 4 hours. There are
exceptions to this, for example news which gains popularity
many hours after publication (this happens usually for
stories published at night and commented on during
business hours) or stories which get a 'second life' due to a
quarrel between a few participants.
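A half-life of this kind can be estimated directly from comment timestamps. The following is a minimal sketch, not the program used in the study: it assumes that the delays between publication and each comment follow a pure exponential law, in which case the maximum-likelihood decay time is simply the mean delay. The function name and the sample delays are hypothetical.

```python
import math


def estimate_half_life(delays_hours):
    """Estimate the half-life (in hours) of comment activity.

    Assumes exponentially distributed delays, so that the decay time
    equals the mean delay and half-life = ln(2) * mean delay.
    """
    mean_delay = sum(delays_hours) / len(delays_hours)
    return math.log(2) * mean_delay


# Hypothetical delays (hours after publication) of comments in one thread.
delays = [0.5, 1.0, 1.5, 2.0, 4.0, 6.0, 9.0]
print(f"estimated half-life: {estimate_half_life(delays):.2f} h")
```

Threads that gain a 'second life' through a late quarrel would violate the single-exponential assumption and should be inspected separately before such an estimate is trusted.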



B. User and comment statistics 

Our first task was to study the typical properties of
the network of comments, namely the activity of users
measured by their indegree and outdegree statistics. It
should be stressed here that the network structure grows by
addition of unique post-to-post links. Therefore
translating the post-to-post network to a user-to-user one, by
necessity, introduces multiple connections between users.
In our analysis we treat these connections as separate,
which is reflected in several statistical properties. The
outdegree $k_o$ of a user, which corresponds to the number of
comments a given user posts in a discussion (or
cumulatively in many discussions), measures the
'productivity'. We have attempted to fit the outdegree
distributions for mid-sized and large discussions (where such
measurements can be meaningful) with a modified power
law $P(k_o) = A (k_o + c_o)^{-\alpha} + H(k_{o,\max} - k_o)$. The first
term has been used previously by Newman et al. [6] in
their analysis of the connectivity of internet sites. The
additional step function $H(k_{o,\max} - k_o)$ reflects the existence
of several users with an exceptionally high number of posts;
an explanation of their origin shall be presented in a later
part of the paper. It is interesting to note that for most
discussions the value of the $c_o$ constant is rather small
($|c_o| < 1$).

We turn next to user indegree, $k_i$. This measures
not the author's activity, but the response to his or her
posts, in some way related to their 'interest' value, or
to the amount of controversy they raise. We note that
there are quite a few posts that do not elicit any
comment, i.e. with indegree equal to zero. Here also we
obtained a reasonably good fit by using the modified power
law $P(k_i) = A (k_i + c_i)^{-\alpha} + H(k_{i,\max} - k_i)$; however, the
value of $c_i$ is much larger. The power law exponent $\alpha$
for indegree is significantly larger than for the outdegree,
indicating a much faster drop of the typical popularity
than of productivity.
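The translation from post-to-post links to user degrees described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' program: the data layout (a dictionary mapping a post id to its author and parent post) and all names are hypothetical. Replies to the news item itself count towards the author's outdegree but towards nobody's indegree, and repeated links between the same pair of users are kept as separate connections.

```python
from collections import Counter


def user_degrees(posts):
    """Compute user outdegree, indegree and directed pair multiplicities.

    posts: dict post_id -> (author, parent_post_id), where the parent is
    None for a comment attached directly to the news item.
    """
    outdegree = Counter()
    indegree = Counter()
    pair_links = Counter()  # (replier, target) -> number of separate links
    for author, parent in posts.values():
        outdegree[author] += 1
        if parent is None:
            continue  # link to the news item, not to another user
        target = posts[parent][0]
        indegree[target] += 1
        pair_links[(author, target)] += 1
    return outdegree, indegree, pair_links


# Hypothetical thread: two users exchanging posts, one isolated comment.
thread = {1: ("a", None), 2: ("b", 1), 3: ("a", 2), 4: ("b", 3), 5: ("c", None)}
k_out, k_in, links = user_degrees(thread)
print(dict(k_out), dict(k_in), dict(links))
```

Keeping the pair multiplicities separately makes it easy to spot the repeated binary exchanges that are discussed next.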

Interestingly, in almost all mid-sized and extended
discussions we find small groups of participants with
unusually high indegree and outdegree values, much above
the predictions of the power law. These users are
responsible for the need for the additive step function part in
the distribution. The explanation of their origin is
simplified by the observation that they are in many cases the
same users. High numbers of connections result not
from the random process of preferential attachment
(typical for internet sites or scientific citations), but from
extended exchanges of posts between pairs of users.
Needless to say, most of such exchanges are confrontational,
filled with disagreements and abuse. For this reason we shall
use the term 'quarrels' to describe them. Moreover, such
verbal duels are easily visible in the graphical view of
the comments web page. Because of this visibility, they
attract additional comments from supporters of each of
the quarrelling sides. Quarrels increase $k_o$ and $k_i$ of their
participants and thus change the general degree
distributions. Typically quarrels longer than 5-7 exchanges take only
between 3 and 7 percent of the total number of comments,
but we have observed discussion threads where this
ratio was much higher, for example 21% of the 220 posts
in a thread resulted from just two long exchanges.
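One possible way to measure the share of a thread taken by quarrels is sketched below. The operational definition is an assumption made for illustration, since the text does not give one: a binary exchange is taken to be a chain of consecutive replies in which exactly two users alternate, and its length is the number of posts in the chain. The data layout and function names are hypothetical.

```python
def exchange_length(posts, post_id):
    """Number of posts in the two-user alternating chain ending at post_id.

    posts: dict post_id -> (author, parent_post_id or None).
    """
    author, parent = posts[post_id]
    if parent is None or posts[parent][0] == author:
        return 1
    # Authors expected while walking up the chain: partner, author, partner...
    pair = (posts[parent][0], author)
    length, current, step = 1, parent, 0
    while current is not None and posts[current][0] == pair[step % 2]:
        length += 1
        current = posts[current][1]
        step += 1
    return length


def quarrel_share(posts, min_len=6):
    """Fraction of posts that belong to exchanges of at least min_len posts."""
    in_quarrel = set()
    for post_id in posts:
        n = exchange_length(posts, post_id)
        if n >= min_len:
            current = post_id
            for _ in range(n):  # mark every post of the chain
                in_quarrel.add(current)
                current = posts[current][1]
    return len(in_quarrel) / len(posts)


# Hypothetical thread: a four-post duel plus an unrelated two-post branch.
thread = {
    1: ("anna", None), 2: ("bob", 1), 3: ("anna", 2), 4: ("bob", 3),
    5: ("carl", None), 6: ("dora", 5),
}
print(quarrel_share(thread, min_len=4))
```

The threshold min_len plays the role of the 'longer than 5-7 exchanges' criterion; other definitions (for example, allowing short interruptions by third users) would give somewhat different shares.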

Recurrence of user nicknames connected with quarrels
in various threads has added plausibility to the
hypothesis that they are largely due to the presence of
'duellists' - users seeking each other's comments and joining
in the fights. For such users the growth of $k_i$ and $k_o$
should be correlated. To test these ideas, we have
performed cumulative statistical analysis for all participants
in a set of 58 discussions. This has been done using the
assumption that the identity of real participants remains
fixed to nicknames within the whole scope of the portal.
The results are quite interesting. Out of almost 2000 users
there were only a few with high values of indegree (16
with $k_i > 50$). Similarly, there were only 23 users with
$k_o > 50$. Eleven users belonged to both these groups.
The average outdegree was $\langle k_o \rangle = 4.65$, while indegree
(excluding references to the original news items to count
only post-to-post links) was $\langle k_i \rangle = 3.04$.

In addition to duellists in the studied discussions we
have found a group of hyperactive users specializing in
abusive comments (known as trolls) who, while
publishing a lot of comments, receive a much smaller number of
replies. For example one user has posted 236 times
receiving only 51 replies. Although trolls post highly
provocative comments they are frequently ignored - most users
seem to know the rule 'don't feed the troll'.

Figure 1 shows the cumulative network topology for
the 58 analysed discussions. There were 1977 users and 9135
posts, out of which 3194 were linked directly to news
items. In the figures we have removed all such links,
leaving only connections between the users. The two views
focus on outdegree (upper panel) and indegree (lower panel). Multiple
connections between pairs of users are colour and width
coded, to emphasize binary exchanges. We can clearly
see how a few of the users seem to dominate the whole
forum. Figure 2 presents the cumulative distributions
$P(k_i)$ and $P(k_o)$, as well as the correlation between $k_i$ and
$k_o$ values. The two quantities are highly correlated, with
a correlation coefficient of 0.85.

From the texts of the comments we know that in many 
cases the motivation for posting was a drive to achieve 
FIG. 1: Two views of the topology of the network connecting the users participating in 58 large and mid-size discussions within
one month. Top panel: size of the nodes corresponds to outdegree of the user. Bottom panel: size of the node corresponds
to indegree. Link width reflects the number of communications between the users - binary exchanges are clearly identifiable.
Some users have been identified by their nicknames. This allows us to identify notorious trolls (such as wrojoz and koloratural),
who have many posts but relatively few responses, and controversy leaders, such as tuskomatolek and junkier (who have more
responses than posts). Despite the fact that almost 2000 users have participated in the discussions, only a few of them,
such as kraliklll, dominate the exchanges by their posting activity and by the concentration of responses. A perfect example
of a user whose participation in discussions is motivated by negation and abuse of a particular opponent is given by rooboy.
[Figure 2 axis labels: Indegree $k_i$, Outdegree $k_o$, Indegree $k_i$]



FIG. 2: Cumulative indegree and outdegree distributions for 58 Politics mid-size and large discussions over a period of 30 days.
The third panel shows the correlation between $k_i$ and $k_o$, with two squares indicating the notorious trolls, i.e. individuals posting
a lot of comments but getting only a few answers. The triangles indicate the most active and at the same time most popular
users, who could be called 'controversy leaders', receiving significantly more comments than they post. To be able to show the
posts that have not resulted in any comment (with indegree equal zero) on the log-log scales we have artificially shifted them 
to ka = 0.1. 



popularity (or at least notoriety). It is interesting to
compare this notion with the observations of Huberman et al.,
who identified the drive to achieve fame and
visibility as one of the distinct factors determining the number
of posts on YouTube. The authors have shown that 'the
productivity exhibited in crowdsourcing exhibits a strong
positive dependence on attention. Conversely, a lack of
attention leads to a decrease in the number of videos
uploaded and the consequent drop in productivity, which in
many cases asymptotes to no uploads whatsoever.'¹ This
is supported in our case by the fact that the identities
of the most productive and most commented on
participants, summed over the set of threads, are highly
correlated (third panel in Figure 2). We observe a crowd of
'one-comment' participants and a few popular and
prolific ones. Moreover, the duellists recognize each other
and tend to join in the sub-threads with remarks of 'Oh,
it's you again, you ***!', which, of course, spur new
rounds of abuse.

An interesting psychological observation is the
existence of impersonators of 'famous' commentators. They
choose a nickname that is at first glance identical to
the original one, for example by adding unobtrusive parts
to the user name, such as changing from XYZ to XYZ. -
which often goes unnoticed. This is the most aggressive
form of trolling. In most cases the views of the
original user and the impostor are radically different. The
troll's intention is to create chaos and confusion, as an
unsuspecting reader often finds comments with radically
different views or even exchanges between apparently the
same participant, quarrelling with himself.



C. Comment classification 

In addition to rather simple counting of links between
posts and users, our goal was to study the
content and tone of these comments. Our intention was to
check if the forum provided any chance of reaching a
consensus, or even of decreasing the level of disagreement.
This part of the analysis required human evaluation,
and the results of assignment of each comment to a given
class are obviously less objective. In some cases we admit
to being unable to classify a post, but in most cases the
task was rather straightforward.

Due to the extremely time-consuming nature of the task,
we have chosen 20 threads from the whole set of 58 full
discussions. They contained between 20 and 250 posts.

Table I presents percentages of various types of
comments between the users (i.e. omitting the classification
of comments addressed to the source messages). The
reason for this omission was to decouple our analysis from
the individual responses to the news article, which have
served mostly to determine the nodeclass value.
Aggressive posts (Dis, Prv, Inv) took almost 75% of all
communications between users - and we should
remember that provocative posts directed to the source news
story also add to the 'discussion temperature'.
Agreements between factions were extremely rare, and are only
slightly more frequent between those users with declared
opinion (nodeclass A, B) and the unidentified ones (NA
nodeclass).

      Intra-faction  Inter-faction  Factions-NA    Intra-NA
      (A-A, B-B)     (A-B)          (A-NA, B-NA)   (NA-NA)
Agr   16.9%          0.6%           2.5%           0.6%
Dis   2.1%           32.8%          11.1%          1.2%
Inv   0.2%           17.1%          3.1%           0.7%
Prv   1.1%           2.7%           1.9%           0.2%
Neu   1.1%           0.8%           1.7%           0.2%
Jst   0.4%           0.0%           0.4%           0.1%
Swi   0.1%           0.0%           0.2%           0.1%

TABLE I: Statistics of comment type between various groups
of users (two identified factions A and B and neutral or
unidentifiable class NA).
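The aggregate shares quoted for Table I can be checked with a few lines of arithmetic. The sketch below only re-adds the published percentages; the variable and function names are hypothetical.

```python
# Percentages from Table I; columns are intra-faction, inter-faction,
# factions-NA and intra-NA links between users.
TABLE_I = {
    "Agr": (16.9, 0.6, 2.5, 0.6),
    "Dis": (2.1, 32.8, 11.1, 1.2),
    "Inv": (0.2, 17.1, 3.1, 0.7),
    "Prv": (1.1, 2.7, 1.9, 0.2),
    "Neu": (1.1, 0.8, 1.7, 0.2),
    "Jst": (0.4, 0.0, 0.4, 0.1),
    "Swi": (0.1, 0.0, 0.2, 0.1),
}


def class_totals(table):
    """Share of each comment class summed over all four user groupings."""
    return {name: round(sum(row), 1) for name, row in table.items()}


def aggressive_share(table, classes=("Dis", "Inv", "Prv")):
    """Combined share of disagreements, invective and provocations."""
    return round(sum(sum(table[name]) for name in classes), 1)


print(class_totals(TABLE_I))
print(aggressive_share(TABLE_I))
```

The rows sum to 47.2% (Dis), 21.1% (Inv) and 5.9% (Prv), that is 74.2% in total, which matches the 'almost 75%' stated in the text; all seven classes together add up to 99.9%, the remainder being rounding.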

From detailed analysis of the discussions it is clear 
that sub-threads related to neutral posts usually died 
much faster than those due to confrontational or abusive 
ones. Long chains of invective-invective and invective- 
disagreement are very frequent, while exchanges between 
users of the same group agreeing with each other are 
usually shorter, rarely extending beyond 3-4 consecutive 
posts. 



D. Factual and emotional content considerations 

To see if these features are indeed characteristic of
the strongly polarized political forum, we have gathered
similar data for two other topics: sport and science. One
would expect a comparable level of emotions to be present
for a sport column, and a much lower level for
science-related news. As we can see from Fig. 3, in all three
cases there is a main group of discussions, with size
distribution P(L) falling roughly as a power law, but there
are also discussions with a very high number of posts L. We
are interested in the possible origins of such threads.

We note first that, to our surprise, the sport forum has
shown a much faster fall with increasing thread size L, as
well as a much higher proportion of news items that do
not attract any comment. This is most likely due to the
fact that sports reporting consists of many items
covering all disciplines (from soccer through tennis to NBA)
and the interests of readers would be distributed over the
disciplines. During the time we have been gathering our
data a few major sports events took place (e.g. the Australian
Open tournament, the Handball World Championships), in
which Polish participants were expected to be successful,
and where high emotional reactions were present.
Indeed, these were the news stories that resulted in high
user participation, equalling in size those of the
political forum. However, the topology of sport discussion
networks of comments was different from the political
ones. For example, the largest discussion, shortly after a
dramatic win by the Polish team in the handball
championships, involved 249 participants and 336 posts. But
203 users have posted only a single comment, attached to
the source message and expressing their joy. The longest
exchange involved just four posts. Upon reflection, the
sport forum provides an almost perfect contrast to
politics: it is not dominated by two factions and the moods
and opinions of participants are usually in sync.

The main topic of our study, political comments with
a high degree of conflict, shows very different behaviour.
There are significantly fewer news stories without any
comment. Most of the stories with a small response consist
of weakly connected posts, which relate directly to the
news source. But, as the size grows, the proportion of
exchanges, duels and provocative output by trolls increases
strongly. Importantly, the proportion of
disagreements is very high and the aggressive behaviour (abuse
and provocations) increases even more strongly with the
size of the discussion. Only in this forum did we find a large
proportion of mid-sized (L > 100) discussions, fuelled by
a mixture of random, preferentially attached comments and
quarrels.

Compared to sport, discussions on science were at the
other extreme. For the most part they are very short and
rather dull, with practically no network structure. But,
from time to time, a strictly scientific topic is translated
by the users into a highly loaded subject. Such was
the case of pre-natal research, which was discussed from
a religious/ethical point of view. A story about details of the last
Ice Age was discussed with all the emotions due to
current environmental issues. In such cases we have observed
an even stronger dominance of binary exchanges in the
general topology. First of all, the number of participants
was usually much smaller than the number of posts. In
one case only 12 people 'produced' 215 posts, with 8 of
them responsible for 208 comments. Binary exchanges
longer than 8 consecutive posts took 72% of the
discussion. It should be noted that while the users disagreed
with each other, only a few comments were abusive, and
many used evidence, references and logical arguments. In
another, smaller thread, a discussion between two
participants consisted of 40 posts (out of a total of 117), while
in yet another, two users have exchanged 41 posts (out of
178), most of them rather long and 'scientific' in spirit.
Thus, we propose that when the topic of a science news item
is perceived by the readers as related to an important
world-view issue, the chance of a localized conflict between pairs of
users is very high. This is coupled with a natural barrier
of accessibility (many participants in political discussions
simply do not look into the science section at all), so that
the proportion of users capable of adding rational
arguments to the discussion is higher.



E. Computer model and simulations 

To get more detailed insight into, and control over, the
relative importance of the processes described above we
FIG. 3: Top left panel: Distribution P(L) of discussion sizes L for three forum topics: politics, sport and science. Filled points
show the number of news items that did not elicit any comments. Lines show power law fits. The large difference of exponent
for the sport forum is due to a much larger number of news items, most of which get no or almost no reaction at all. Bottom left
panel: average outdegree $\langle k_o \rangle$ for the politics forum as a function of L. Right panel: correlation between $\langle k_o \rangle$ and the percentage
of discussion spent on binary exchanges (quarrels). Points show percentages of quarrels longer than 6 posts and longer than 4
posts. The data support the supposition that large $\langle k_o \rangle$ values are due to extended quarrels between a few participants.



have constructed a simplified computer model of the
community and discussion process. We have tried to keep the
model as simple as possible, to focus on the crucial
aspects of the process. In the model, discussion participants
are simulated by agents with the following
characteristics: nodeclass (we have kept only A and B classes, no
unknowns or neutrals, although they are within the
program's capabilities) and activity class (high or low level of
activity).

The simulation process is rather typical: agents are
selected randomly, and then 'choose' the post they wish to
comment on, from the set of earlier posts. A certain
proportion of the agents look directly to the 'source'
message. For others we use preferential attachment rules
for the probability of picking the target post. Specifically,
the chance of choosing a post is proportional to its
total degree (outdegree of a post is always 1, indegree may
be quite high). After choosing the target post, the agent
then decides whether or not to comment on it. The
probability depends on the activity class and on the
nodeclass of the author of the target remark. In the simplified
model, agents of the same class would always agree with
each other, while agents of different nodeclass would
always disagree. To describe the tendency favouring
negative responses, we have assumed that posting disagreeing
comments is more probable by a factor of 2 than simple
agreements. Active participants have a higher probability
of responding than passive ones. The simulation is run until
a preassigned number of posts has been placed, at which time
suitable statistics are measured.
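The steps above can be sketched in a few dozen lines. This is an illustrative reconstruction, not the authors' program: all parameter names and numerical values (number of agents, share of active agents, posting probabilities, share of agents looking at the source message) are assumptions chosen only to make the example run. The only quantity taken from the text is the factor of 2 favouring disagreement.

```python
import random


def simulate_thread(n_agents=200, n_posts=300, p_source=0.3,
                    p_active=0.6, p_passive=0.2, active_fraction=0.1,
                    disagree_boost=2.0, seed=0):
    """Grow one discussion thread without quarrels.

    Returns (agents, posts): agents[i] = (nodeclass, is_active) and
    posts[j] = (author index, parent post index or None for the source).
    """
    rng = random.Random(seed)
    agents = [(rng.choice("AB"), rng.random() < active_fraction)
              for _ in range(n_agents)]
    posts = []
    indegree = []  # number of replies received by each post
    while len(posts) < n_posts:
        author = rng.randrange(n_agents)
        nodeclass, is_active = agents[author]
        p_comment = p_active if is_active else p_passive
        if not posts or rng.random() < p_source:
            parent = None  # the agent looks directly at the source message
        else:
            # Preferential attachment: weight = total degree = 1 + indegree.
            weights = [1 + d for d in indegree]
            parent = rng.choices(range(len(posts)), weights=weights)[0]
            target_class = agents[posts[parent][0]][0]
            if target_class == nodeclass:
                # Same faction means agreement, assumed half as likely.
                p_comment /= disagree_boost
        if rng.random() >= p_comment:
            continue  # the agent decides not to comment
        posts.append((author, parent))
        indegree.append(0)
        if parent is not None:
            indegree[parent] += 1
    return agents, posts


agents, posts = simulate_thread()
print(len(posts), sum(1 for _, parent in posts if parent is None))
```

Degree statistics can then be collected from the returned list of posts; in this sketch an agent may occasionally answer its own post, which a more careful implementation might exclude.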

Results of such a model are presented in the accompanying figure. The
indegree distribution shows some similarity with
observations and good agreement with a power-law distribution,
expected for preferential attachment rules. On the other
hand, outdegree shows a significantly faster, exponential
decrease rather than a power law. This is not surprising,
as it corresponds to the probabilistic choice of agents posting
the comments. There are no agents with unusually high
values of indegree and outdegree, nor is there any
significant correlation between $k_i$ and $k_o$. We conclude that
the model needs modification to be able to describe our
observations.

The key enhancement of the model is an additional
step in the simulation process. Specifically, after the
agent has posted a comment, the 'author' of the
target post is 'given a chance' to respond. The probability
of such a response is assumed to be higher than for a normal
post (to reflect the situation when an agent might 'feel
personally interested' in responding). If the response is
placed, the roles of the agents are reversed, and again a
chance for a counter-response is evaluated. This chain is
continued until one of the agents 'decides' to quit. The
exchange between the two is decoupled from the rest of the
simulation and it is possible to derive simple analytical
formulae for the mean length of such an exchange. A proper
choice of response probabilities can be achieved by
comparing the lengths of simulated and observed quarrels,
and is a partially independent measure of agreement
between simulation and reality.
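The mean length of such an exchange can be illustrated under one simplifying assumption that the text does not state: both agents answer with the same fixed probability q at every turn. The number of replies is then geometrically distributed, P(n) = q^n (1 - q), with mean q / (1 - q). The sketch below compares this formula with a direct simulation; the function names and the value of q are hypothetical.

```python
import random


def quarrel_replies(q, rng):
    """Simulate one exchange: each post is answered with probability q."""
    n = 0
    while rng.random() < q:
        n += 1
    return n


def mean_replies(q):
    """Analytical mean of the geometric chain: sum_n n q^n (1 - q)."""
    return q / (1.0 - q)


rng = random.Random(42)
q = 0.8  # hypothetical response probability
runs = 20000
simulated = sum(quarrel_replies(q, rng) for _ in range(runs)) / runs
print(f"simulated mean: {simulated:.2f}, analytical mean: {mean_replies(q):.2f}")
```

With different response probabilities for the two agents (for example an active and a passive one) the chain is still a simple product of probabilities, so the same kind of closed-form mean can be written down and matched to the observed quarrel lengths.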

When the quarrels are added into the simulations, the
results become much closer to the observations, as shown
in the corresponding figure. The similarity goes beyond the indegree and
outdegree distributions and extends to details of the
network structure, for example the correlation
coefficient between $k_i$ and $k_o$. Thus, even though radically
simplified (no neutral posts, fully polarized opinions of
agents, simple process), the model yields results
surprisingly close to reality.

We note that the computer model described above is
able to reproduce the characteristic results for the three
mentioned forums, by simple adjustments of the probabilities
of posting and of entering into a duel.

One of the reasons for the success of the simulations
might be the fact that in a large part of the analysed
discussions people do behave like mindless automata.
Many comments are almost automatic responses to abuse
- in the form of further abuse. Others are canned
accusations that the supporters of the opposing political
faction are liars, thieves, idiots or worse. There are
very few explanatory or evidence posts (unlike in topical
forums, where many users aim at helping each other). Even
if such evidence appears, it is immediately, almost
automatically, questioned as coming 'from the other side'.
In fact, after reading so many posts we could predict the
character of a post with high probability by looking
up the names of the participants, without reading the
discussion.



IV. DISCUSSION 

A. Citation networks 

One of the topics where the network approach has a long
history of successful use is the analysis of scientific
citations. One can refer here to Newman [11], [12], [13], [14],
Redner [15], Silagadze [16], Gupta et al. [17] and Vazquez.



There are many functional similarities between
internet forum discussions and research publication
citations. In both cases the participants share some common
interest, although not necessarily a viewpoint or opinion.
In both cases communication is indirect, via links to
messages or publications, rather than directly between
people. Also, the lifetime of communication is limited.
Of course the time-scales are vastly different (hours
compared to years), but this is mitigated by the ease of
posting a comment and the resulting speed of reaction,
compared to the rather slow scientific publication process.

In addition to these similarities, there are, however,
serious differences. Probably the most important are due
to attitude. Science is a work of cooperation, and even if
authors hold opposing opinions on some topic, the
publications usually reflect the common aim of getting closer
to a true explanation. While it would be untrue to claim
that there is no personal conflict in science (in private
or even in public), there is little room for personal
animosity in refereed publications. This lowers the level
of emotional reactions. Moreover, there is no anonymity
in scientific publications, and due to the collaborative
nature of modern research the authors themselves form a
complex network of interdependence. Lastly, the content of
scientific publications is dominated by evidence and
evaluation, and is as depersonalised as possible.

In summary, we have two social phenomena which
share some of the technical aspects of indirect, mediated
communication, while differing strongly in their
psychological basis and content. It is quite interesting to
look for similarities and differences in the network
structures of the two cases.

Scientific citation networks have been shown to exhibit
modified power-law statistics, especially for the
large $k_i$ part of the distribution of citations (for
example Newman). There are, however, also models using
different forms of distribution. For example, Redner
[15] initially postulated that the indegree of a citation
network is well modelled by a stretched exponential
$P(k_i) \sim \exp(-\beta k_i^{\gamma})$ for low $k_i$ values. The existence of
hugely popular papers has been given various explana- 
tions. Some refer to content and importance, pointing 
out the correlation between citation statistics and real 
importance of the paper or the author. There are also ex- 
planations which rely only on statistical properties, such 
as a very simple computer model of a 'randomly citing
scientist', which has reproduced remarkably closely the
actual distribution of the frequency of citing (Simkin and
Roychowdhury).

The general shapes of the indegree and outdegree
distributions for citation networks and for internet
discussions are remarkably close. The main difference is
the much more pronounced role of a few highly connected
users in our case, which we attribute to quarrelling
individuals - a phenomenon absent in scientific
publications, where such exchanges are relatively rare and
procedurally limited to a single remark/response cycle. It
is interesting to observe that the social network structure
seems rather stable, regardless of whether the connections
are motivated by common interests or by spite and hate.

FIG. 5: Indegree and outdegree distributions obtained from computer simulations. The third panel shows correlation between
$k_i$ and $k_o$.

A very interesting analysis of the first-mover advantage
in scientific publications has appeared recently (Newman
[24]). He has shown that there is a significant bias
promoting citations to early papers in a distinct field of
research, suggesting, tongue-in-cheek, that it is a better
strategy to write the first paper in the field than to
write the best one. This bias is, in a sense, related to
publication and citation mechanisms, not to actual content.
But Newman points out that even where the first-mover
effect is strong, a small number of later papers attract
significant attention in defiance of the advantage of the
earlier ones.

Despite the fact that the motivation for posting is
radically different, the same phenomenon is observed in the
network discussions. The reason is again technical. Most
of the heated exchanges are related to early posts. This
is due to the way the discussion is visually fragmented
into pages containing 100 posts - viewing the later
comments requires more effort. Thus the late responses to
the original news story are not immediately visible and
are at a disadvantage compared to early ones. Yet, if there
is an interesting discussion, the reader can follow it by
pressing the 'next' button, skipping the boundaries
between cumulative displays. In such a situation some posts
might get a high response rate despite their late timing.



B. Internet networks, blogs and hate groups 

A similar comparison may be attempted between the
comment networks and the ones formed by Internet web
sites. Here again we find some resemblance and some
differences. The most important difference is that web
sites and their links are much more stable than comments
- usually more thought is given by the authors when
deciding what their pages should be linked to. Also, these
links are usually driven by common interests and views.
One seldom finds links to web sites presenting the opposite
viewpoint - web pages offer a good example of a system
where 'birds of a feather flock together'. The last
difference might be that generally there is less emotion
and more content in traditional web pages.

As in the case of citation networks, despite the
differences mentioned above, the general statistical
properties of discussion threads and web site links are
remarkably universal. For example, the modified power-law
distribution of page degree is very similar to our case. The
discussions share even more common features with blogs,
where the personal and emotional content is dominant.
A study of blogging behaviour in the strongly polarized
environment of the 2004 US elections has been published by
Adamic and Glance [25]. It showed a strong preference
for links between blogs of the same political
orientation - the links between opposing blogs were present,
but not numerous, limited to about 15% of the total
number of links. This stands in contrast to our observations.
A plausible explanation relies again on the difference
between the purposes of the two types of communication.
Blogs, especially election campaign ones, are
written with the main aim of promoting a particular party
or candidate. The conflict with the opponents, if present,
is secondary. On the other hand, the large percentage of
abuse and invective in the discussion posts shows that
the main purpose is to vent emotions and possibly to
incite hate. Thus the percentage of cross-group links is
much higher.
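
The comparison above rests on a single number, the share of links that cross the political divide. A minimal sketch of how such a share could be computed from a list of links is given below; the function name, the faction labels and the toy data are hypothetical and serve only as an illustration, not as the procedure used in either study.

```python
def cross_group_fraction(edges, faction):
    """Share of links whose two endpoints belong to different factions.

    edges   -- list of (source, target) pairs
    faction -- dict mapping every node to a faction label (labels are assumed)
    """
    if not edges:
        raise ValueError("edge list is empty")
    crossing = sum(1 for src, dst in edges if faction[src] != faction[dst])
    return crossing / len(edges)


# Toy data, invented for illustration only.
faction = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
edges = [("a1", "a2"), ("b1", "b2"), ("a1", "b1"), ("b2", "a2")]
share = cross_group_fraction(edges, faction)
```

In this toy example two of the four links connect users from opposite factions, so `share` equals 0.5.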

Another detailed study of blogging behaviour by
Leskovec et al. [26] shows similar power-law behaviour
for the indegree and outdegree distributions, albeit with
different exponent values. An important difference
between the two systems is the lack of significant
correlation between indegree and outdegree for the
blogs, with a $k_i$ and $k_o$ correlation coefficient of only
0.16, much lower than 0.85 in our networks dominated by
bilateral exchanges. Leskovec et al. [26] propose a cascading
model of blog links and provide data on the relative
probability of various patterns of link connections. The
binary exchanges of our approach, which would correspond to
a linear topology in the cascade model, are relatively less
probable in the blog case, where the cascades tend to be
wide and not deep.
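
The correlation coefficient between indegree and outdegree quoted above can be computed directly from a list of directed links. The sketch below assumes the Pearson coefficient taken over all users who appear in the network (the text does not state which coefficient was used), and the function name and input layout are illustrative assumptions.

```python
from collections import Counter
from math import sqrt


def degree_correlation(edges):
    """Pearson correlation between indegree and outdegree over all users.

    edges -- list of (author, target) pairs: `author` replied to `target`.
    Assumes both degree sequences have non-zero variance.
    """
    outdeg = Counter(author for author, _ in edges)
    indeg = Counter(target for _, target in edges)
    users = sorted(set(outdeg) | set(indeg))
    xs = [indeg[u] for u in users]   # Counter returns 0 for missing users
    ys = [outdeg[u] for u in users]
    n = len(users)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)
```

A network dominated by bilateral exchanges gives each quarrelling user similar numbers of incoming and outgoing links, which pushes this coefficient towards 1. Wide, shallow cascades do not.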

The discussions studied in this work are by no means
the only examples of hate present in the vast space
of the Internet. There are already works which study
the network structure of 'Hate groups' (Chau and Xu
[27], Chau et al. [28]). These studies are important for
us for two reasons. First, they focus on bloggers, who
enjoy a lot of freedom to express their opinions and
emotions. Second, the authors use networking methods
similar to the ones employed here. The network of users
is formed through formal subscriptions between blogs
and through impromptu comments posted to each other.
This last aspect corresponds directly to our situation.
While the political views studied by Chau and Xu are
probably more extreme than the ones of the readers of the
www.gazeta.pl portal, the emotional reactions seem to
be as strong. It is quite interesting that the degree
distribution for the giant component of 273 nodes in the
Chau and Xu network exhibits power-law behaviour,
$P(k) \sim k^{-\alpha}$, with exponent $\alpha \approx 1.38$.
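
How the exponent was fitted by Chau and Xu is not stated here. As a hedged sketch of one common approach, the block below uses the continuous maximum-likelihood estimator $\alpha = 1 + n / \sum \ln(k / k_{min})$, which assumes a pure power law above a chosen cut-off `k_min`. Treating integer degrees as continuous values is an approximation, and the synthetic sampler is included only to check the estimator on data with a known exponent. All names and parameter values are illustrative assumptions.

```python
import math
import random


def sample_power_law(alpha, k_min, n, seed=0):
    """Draw n continuous values with density proportional to k**(-alpha), k >= k_min.

    Uses inverse-transform sampling; requires alpha > 1.
    """
    rng = random.Random(seed)
    return [k_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
            for _ in range(n)]


def estimate_alpha(values, k_min):
    """Maximum-likelihood estimate of the exponent for values >= k_min.

    Assumes at least one value lies strictly above k_min.
    """
    tail = [v for v in values if v >= k_min]
    log_sum = sum(math.log(v / k_min) for v in tail)
    return 1.0 + len(tail) / log_sum
```

For a network as small as 273 nodes, the statistical uncertainty of such an estimate is considerable, so differences in exponent between studies should be read with caution.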



C. Implications for consensus formation modelling 

The last conclusion from our observations relates to a
different domain. One of the fast-growing fields in
sociophysics is the study of opinion formation (for a recent
review see Castellano et al. [29]). Most models use so-called
'agent based societies' and assume that consensus in a
society forms through a series of exchanges between agents.
Depending on the model, the initial opinions of the agents
are changed as a result of the interactions. Some models
postulate a form of averaging of opinions towards a
mean value (for example Deffuant et al. [30], Hegselmann
and Krause [31]), others assume that one of the
interacting agents switches his or her opinion to fit the
others (Sznajd-Weron and Sznajd [32]). Consensus
formation models have become very popular; unfortunately,
in large part the studies concentrate on mathematical
formalisms or Monte Carlo simulations, and not on
descriptions of real-life phenomena. The need to bring
simulations and models closer to reality has been realized and
voiced quite a few times (Moss and Edmonds [33],
Epstein [34], Sobkowicz [35]).
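
To make the averaging assumption concrete, the block below sketches the bounded-confidence update usually associated with Deffuant et al.: two randomly chosen agents move towards each other by a fraction `mu` of their difference, but only if their opinions differ by less than a threshold `epsilon`. The parameter values, the uniform initial opinions and the function names are assumptions chosen for illustration. This is not a reproduction of any specific published simulation.

```python
import random


def deffuant_step(opinions, i, j, epsilon, mu):
    """Apply one pairwise encounter in place.

    If the two opinions differ by less than epsilon, both agents move
    towards each other by the fraction mu of their difference.
    """
    if abs(opinions[i] - opinions[j]) < epsilon:
        shift = mu * (opinions[j] - opinions[i])
        opinions[i] += shift
        opinions[j] -= shift


def run_deffuant(n_agents=200, epsilon=0.5, mu=0.5, n_steps=50_000, seed=0):
    """Run random pairwise encounters starting from uniform random opinions."""
    rng = random.Random(seed)
    opinions = [rng.random() for _ in range(n_agents)]
    for _ in range(n_steps):
        i, j = rng.sample(range(n_agents), 2)  # two distinct agents
        deffuant_step(opinions, i, j, epsilon, mu)
    return opinions
```

Under this rule an encounter can only bring two opinions closer together or leave them unchanged. It can never push them apart, and the mean opinion of the population is conserved.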

An interesting result from the present study is that the
exchanges studied here (voicing of opinions in a
quasi-anonymous medium) do not lead to consensus
formation at all! If anything, the exchanges lead to an
increased rift between the participants. This effect should
be studied in much more detail, as it possibly bears strongly
on the usability of the models of consensus formation.






On the one hand, we could assume that this is a
phenomenon specific to computer mediated interactions,
with their lack of the face-to-face effects of increased
responsibility, shyness, induced submissiveness and even
sympathy. Anonymity and lack of fear of retribution might
embolden the participants and also promote additional
mischievousness (clearly visible in the presence of
provocative posts). Thus one might assume that the studied
form of exchanges is an exception to the general rule of
opinions getting closer as a result of interactions.

But everyday experience shows that also when people
meet face-to-face, with full use of non-verbal and
emotional communication, the clash of opinions seldom leads
to agreement or even to a weakening of differences. Both
history and literature are full of examples of undying
feuds, where acts of aggression follow each other, from
Shakespearean Verona families to modern political or
ethnic strife. Encounters and links between holders of
conflicting opinions often lead to a strengthening of
opposing convictions - in some cases additionally leading to
the severing of links between individuals and groups.

The observations of the Internet discussions should 
therefore be augmented by sociological data on flesh-and- 
blood conflicts and arguments, and the dynamics of the 
opinion shifts. But even before such studies are done or 
referred to (which the present authors feel is beyond their 
competence) the basic assumptions of the sociophysical 
modelling of consensus formation should be reworked. 
This is a very interesting task, because ostensibly we are 
faced with two incompatible sets of observations: 

• Even in the face of repeated information exchanges,
with some of them using 'hard data' and evidence
to support their viewpoint, participants in the
Internet discussions tend to hold to their opinions,
perhaps even strengthening their resolve with each
exchange. Within the analysed subset of the
discussions the conversion of opinion - even a simple
agreement with a statement from the opposing side - was
virtually absent. Interactions do not seem to lead
to opinion averaging or switching.

• Yet, most of the participants do have well defined
opinions. These must have formed in some way.
There are studies indicating a genetic/biological basis
for some of the political tendencies (Jost et al.
[36], Alford et al. [37], Amodio et al. [38], Haidt
and Graham [39]). So perhaps the participants
in our discussions did have a built-in tendency to
pick one of the sides of the divide, and to stick
to it. Regardless of genetic considerations, political
attitudes are thought to be dependent on
fairly stable elements, such as childhood environment,
which again decreases the chances of reaching
a consensus. On the other hand, the opinions on
specific events, laws or people can be neither
genetically coded nor due to cultural formation -
they must be reached quickly. Disputants, at one
stage or another, become convinced that Mr X. is
a hero - or a villain. This means that there must
be some effective mechanisms of opinion formation.
Certainly, the Internet discussions do not provide
such a mechanism. The question is - how to describe
such a process in a 'sociophysical' framework, so far
strongly preoccupied with a model that assumes
that encounters can and do change our attitudes?

The persistence of differences of opinion exhibited in
the online discussions studied in this work stands in contrast
to the observations of Wu and Huberman [40], [41], who
measured a strong tendency towards moderate views in the
course of time for book ratings posted on Amazon.com.
However, there are significant differences between book
ratings and the expression of political views. In the first case
the comments are generally favourable and the voiced
opinions are not influenced by personal feuds with other
commentators. Moreover, the spirit of a book review is a
positive one, with the official aim of providing useful
information for other users. The helpfulness of each of the
reviews is measured and displayed, which promotes
prosociality and 'good' behaviour. In the case of political
disputes it is often the reception in one's own community
that counts, the show of force and verbal bashing of
the opponents. The goal of being admired by supporters
and hated by opponents promotes very different actions
than in cooperative activities. For this reason, there
is little to be gained by a commentator from placing
moderate, well reasoned posts - neither popularity
nor status is increased.

Our results suggest possible future models of consensus
formation that would take into account not only factors
leading to convergence of opinions, but also those
leading to their divergence. Nonlinear interplay between
these tendencies might lead to interesting results, perhaps
closer to actual social situations than those of current models.